Object detection device, object detection server, and object detection method

ABSTRACT

A tag communication section (12) receives tag information transmitted from an information tag (11) attached to a person (P) to be detected. An attribute lookup section (15) looks up an attribute storage section (16) using an ID included in the tag information to obtain attribute information such as the height of the person (P). A target detection section (14) specifies the position, posture and the like of the person (P) in an image obtained from an imaging section (13) using the attribute information.

TECHNICAL FIELD

The present invention relates to a technology for detecting the accurate position, posture and the like of a given detection target, such as a person or an object, in an image.

BACKGROUND ART

Patent Literature 1 below discloses a technology for controlling the orientation and zoom of a monitor camera in response to the approach of a transmitter. Specifically, a radio transmitter that transmits an ID code is attached to a person to be monitored. An antenna for detecting the approach of a transmitter is placed in an off-limits area. Once the approach of a transmitter is detected via the antenna, a monitor camera capable of capturing the surroundings of the antenna is automatically selected from among a plurality of monitor cameras, and an image taken with the selected camera is displayed on a monitor. In addition, the ID code transmitted from the transmitter is read via the antenna, and based on the height of the person associated in advance with the ID code, the orientation and zoom of the monitor camera are determined.

(Patent Literature 1) Japanese Laid-Open Patent Publication No. 9-46694

Problems to be Solved

In recent years, with the proliferation of the Internet, use of monitoring systems has begun in which a monitor camera is connected to the Internet to enable transmission of images taken with the monitor camera. Such a system costs less and allows easier placement of monitor cameras than a system using a dedicated line. Even in such a system, however, an operator for monitoring images is still necessary. In the future, therefore, a technology permitting not only automatic capture of a target to be monitored but also automatic retrieval of useful information from images is desired.

Also, with the recent advance of robot technology, home robots for assisting human lives are expected to become a reality. Such a robot must have the function of detecting the situation surrounding it and acting in harmony with that situation. For example, to move properly in a house and work with people and objects, a robot must accurately detect the position, posture and motion of the persons and objects surrounding it. Otherwise, the robot will not be able to move and work accurately, much less assist human lives.

With the proliferation of portable cameras such as hand-held video cameras, digital cameras and camera-equipped mobile phones, it is increasingly desired that even an unskilled user be able to photograph a subject properly. For this, it is important to detect the position and the like of a target accurately.

However, the prior art described above has difficulty meeting these needs. Specifically, the above prior art merely selects a camera capturing a detection target in response to the approach of a transmitter. The prior art falls short of acquiring detailed information such as where the detection target is in a captured image and what posture the target takes. In addition, with the use of an antenna to locate the position, a comparatively large error (several meters to over ten meters) occurs in the position information. Therefore, if another person is present near the detection target, it is difficult to distinguish one from the other in an image.

Some technologies for detecting an object from a camera image by image processing alone are conventionally known. However, while usable under very restricted conditions, most of these technologies suffer frequent misdetection in environments in which humans normally live, and are thus difficult to apply. The reasons are that the dynamic range of a camera itself is limited, that various objects other than the detection target and the background exist, and that the image of one target at the same position may change in various ways with changes in sunlight and illumination.

The human visual mechanism can detect a target accurately even in a highly variable environment by utilizing an enormous amount of knowledge and rules acquired from experience. Currently, it is extremely difficult to incorporate knowledge and rules like those acquired by humans into equipment. In addition, an enormous amount of processing and memory would be necessary to achieve such processing, which would disadvantageously increase the processing time and cost.

In view of the above problems, an object of the present invention is to provide an object detection technology permitting precise detection of the position, posture and the like of a detection target in an image without requiring an enormous amount of processing.

DISCLOSURE OF THE INVENTION

The object detection equipment of the present invention includes: an imaging section for taking an image; a tag communication section for receiving tag information transmitted from an information tag attached to a given target; and a target detection section for detecting the given target in the image taken by the imaging section using the tag information received by the tag communication section.

Accordingly, the target detection section uses tag information transmitted from an information tag attached to a given target in detecting the given target in an image taken by the imaging section. That is, information unobtainable from the image can be obtained from the tag information or retrieved based on the tag information, and such information can be utilized for image processing. Therefore, a given target can be detected precisely from an image without requiring an enormous amount of processing.

In the object detection equipment of the present invention, preferably, the tag information includes attribute information representing an attribute of the given target, and the target detection section performs the detection using the attribute information included in the tag information received by the tag communication section.

In the object detection equipment of the present invention, preferably, the tag information includes ID information of the given target, the object detection equipment further includes: an attribute storage section for storing a correspondence between ID information and attribute information; and an attribute lookup section for looking up the contents of the attribute storage section using the ID information included in the tag information received by the tag communication section, to obtain attribute information of the given target, and the target detection section performs the detection using the attribute information obtained by the attribute lookup section.

Preferably, the target detection section includes: an image segmentation portion for determining a partial image area having the possibility of including the given target in the image; and an image recognition portion for detecting the given target in the partial image area determined by the image segmentation portion, and at least one of the image segmentation portion and the image recognition portion performs processing by referring to the attribute information.

In the object detection equipment of the present invention, preferably, the tag information includes position information representing a position of the information tag, and the target detection section performs the detection by referring to the position information included in the tag information received by the tag communication section.

In the object detection equipment of the present invention, preferably, the tag communication section estimates a position of the information tag from a state of reception of the tag information, and the target detection section performs the detection by referring to the position estimated by the tag communication section.

In the object detection equipment of the present invention, preferably, the tag information includes a detection procedure for the given target, and the target detection section performs the detection by executing the detection procedure included in the tag information received by the tag communication section.

In the object detection equipment of the present invention, preferably, the target detection section performs the detection with image processing alone, without use of the tag information, when the reception state of the tag communication section is poor.

The object detection server of the present invention receives an image taken by an imaging section and tag information transmitted from an information tag attached to a given target, and detects the given target in the image using the tag information.

The object detection method of the present invention includes the steps of: receiving an image taken by an imaging section; receiving tag information transmitted from an information tag attached to a given target; and detecting the given target in the image using the tag information.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram showing a configuration of object detection equipment of Embodiment 1 of the present invention.

FIG. 2 is a block diagram conceptually showing an inner configuration of an information tag in Embodiment 1 of the present invention.

FIG. 3 is a flowchart showing a flow of processing of object detection in Embodiment 1 of the present invention.

FIG. 4 is a flowchart showing an example of processing in step S4 in FIG. 3.

FIG. 5 is a view showing a situation of the object detection in Embodiment 1 of the present invention.

FIG. 6 shows an example of an image taken by an imaging section in the situation of FIG. 5.

FIG. 7 is a view showing candidate areas determined by an image segmentation portion in the image of FIG. 6.

FIG. 8 is a view showing a candidate area determined using position information obtained from the information tag, in the image of FIG. 6.

FIG. 9 is a view showing an area determined from the candidate areas shown in FIGS. 7 and 8.

FIG. 10 shows an example of information used for generation of a template.

FIG. 11 is a view showing motion trajectories obtained from an image.

FIG. 12 is a view showing a motion trajectory based on the position information obtained from the information tag.

FIG. 13 is a view showing an example of placement of a plurality of cameras.

FIG. 14 is a flowchart showing a method of determining a coordinate transformation for transforming spatial position coordinates of a detection target to image coordinates by camera calibration.

FIG. 15 is a view showing a situation of object detection in Embodiment 2 of the present invention.

FIG. 16 shows an example of an image taken by an imaging section in the situation of FIG. 15.

FIG. 17 is a view showing candidate areas determined by an image segmentation portion in the image of FIG. 16.

FIG. 18 is a view showing a candidate area determined using position information obtained from an information tag in the image of FIG. 16.

FIG. 19 is a view showing an area determined from the candidate areas shown in FIGS. 17 and 18.

FIG. 20 is a view showing a situation of object detection in Embodiment 3 of the present invention.

FIG. 21 is a block diagram showing a configuration example of a camera 50 in FIG. 20.

FIG. 22 is a flowchart showing a flow of processing in Embodiment 3 of the present invention.

FIG. 23 shows an example of an image obtained during the processing in FIG. 22.

FIG. 24 shows an example of a human-shaped template used during the processing in FIG. 22.

FIG. 25 is a flowchart showing an example of the procedure for switching the detection processing.

BEST MODE FOR CARRYING OUT THE INVENTION

Hereinafter, embodiments of the present invention will be described with reference to the drawings. Note that a component common to a plurality of drawings is denoted by the same reference numeral, and its detailed description may be omitted in some cases.

(Embodiment 1)

FIG. 1 is a block diagram showing a configuration of object detection equipment of Embodiment 1 of the present invention. In the configuration of FIG. 1, a person P is the detection target. Referring to FIG. 1, a tag communication section 12 receives tag information transmitted from an information tag 11 attached to the person P. An imaging section 13 takes images. A target detection section 14 detects the person P in the image taken by the imaging section 13 using the tag information received by the tag communication section 12. The imaging section 13 is placed at a position at which images including the person P can be taken.

FIG. 2 is a block diagram conceptually showing an internal configuration of the information tag 11. Referring to FIG. 2, a communication portion 110 communicates with the tag communication section 12 in a noncontact manner via a radio wave, an acoustic wave, light or the like, and sends predetermined tag information. A storage portion 111 stores, as tag information, attribute information (height, age, sex and the like), ID information and the like of the person P to whom the information tag 11 is attached. A position detection portion 112 detects the position of the information tag 11 with a positioning system using an artificial satellite, such as the Global Positioning System (GPS), for example, and outputs the result as tag information. The information tag 11 transmits the tag information output from its storage portion 111 and position detection portion 112 via its communication portion 110.

The information tag 11 may be attached to a mobile phone carried by the person P, for example. Both the storage portion 111 and the position detection portion 112 may be used, or only one of them may be used. As a positioning system for providing position information other than the GPS, a system based on measuring the distance of a mobile phone or a PHS (Personal Handy-phone System) terminal from its base station, for example, may be used.

The configuration of FIG. 1 also includes: an attribute storage section 16 for storing the correspondence between ID information and attribute information; and an attribute lookup section 15 for obtaining attribute information of the person P by looking up the stored contents of the attribute storage section 16 using the ID information included in the tag information received by the tag communication section 12.

The ID information as used herein refers to a code or a mark associated in advance with each target. The ID information may be given for each individual target or for each category the target belongs to. For example, when the target is a person, as in this embodiment, ID information unique to each person may be given, or ID information unique to each family and common to the members of the family may be given. When the target is an object such as a pen, for example, ID information unique to each pen may be given, or ID information unique to each color and shape and common to pens having the same color and shape may be given.

The target detection section 14 detects the person P in images taken by the imaging section 13 using the tag information received by the tag communication section 12 and/or the attribute information obtained by the attribute lookup section 15. The target detection section 14 includes an image segmentation portion 141 and an image recognition portion 142.

The image segmentation portion 141 determines a partial image area having the possibility of including the person P in an image taken by the imaging section 13. When the GPS or the like is used for detection of the position of the information tag 11 as described above, a large error is unlikely to occur. However, it is still difficult to attain high-precision position detection and to acquire detailed information such as the posture and face position of a person. This is due to theoretical limits on the precision of a sensor, the influence of errors and disturbances in the actual use environment, practical limits on the number of usable sensors, and other reasons. As for image processing, the partial image area can be determined with comparatively high precision under idealistic conditions (for example, situations where the illumination varies little and only a limited set of objects is captured). However, misdetection is likely to occur in situations where various objects exist and the illumination varies, as in images taken outdoors.

The image segmentation portion 141 uses the tag information and the attribute information in integration with the image, so that the partial image area can be determined with higher precision. A plurality of partial image areas may be determined, and no area may be determined if there is no possibility that the detection target exists.

The image recognition portion 142 detects whether or not the person P exists in the partial image area determined by the image segmentation portion 141 and, if the person P exists, detects the position, posture and motion of the person P. As described above, a large error is unlikely to occur in the position information from the information tag 11, but it is difficult to acquire high-precision or detailed information from the position information. As for image processing alone, it is difficult to detect many unspecified persons with high precision. In view of this, the image recognition portion 142 uses the height of the person P, for example, as the tag information or the attribute information, and this improves the precision of the image recognition processing. For example, when template matching is adopted as the recognition technique, the size of the template may be set according to the height of the person P, as in the sketch below. This improves the detection precision, and also reduces the processing amount because the templates used for the recognition processing can be narrowed down.
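For illustration, setting the template size from the tagged height could look like the following minimal sketch, which assumes a simple pinhole camera model; the function name, the focal length in pixels and the distance input are assumptions for illustration, not values given in the embodiment:

```python
import cv2

def scale_template_to_height(template, height_m, distance_m, focal_px):
    """Scale a reference human-shaped template so its pixel height
    matches the apparent size of a person of height_m metres seen at
    distance_m metres by a camera whose focal length is focal_px
    pixels, using the pinhole relation h_px = f * H / D."""
    target_h = max(1, int(round(focal_px * height_m / distance_m)))
    scale = target_h / template.shape[0]
    target_w = max(1, int(round(template.shape[1] * scale)))
    return cv2.resize(template, (target_w, target_h))
```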

As described above, in the configuration of FIG. 1, by using the tag information received by the tag communication section 12 and the attribute information obtained by the attribute lookup section 15 in integration with an image, the person P as a given target can be detected in the image with high precision while suppressing an increase in the processing amount.

FIG. 3 is a flowchart showing a flow of the object detection processing in this embodiment. In this flow, the processing using an image (S2, S3 and S4) and the information acquisition processing using an information tag (S5, S6 and S7) may be performed in parallel with each other. Referring to FIG. 3, first, an instruction to detect a given target is issued (S1). The target may be designated by the user of the system or by the person to be detected himself or herself. Alternatively, the target may be automatically designated according to the time and place, or all targets whose tag information can be obtained from their respective information tags may be designated.

Thereafter, a camera supposed to capture the detection target is specified (S2). When a plurality of cameras are used, as will be described later, all cameras capable of capturing the target are selected. In this step, a camera having the possibility of capturing the detection target may be designated in advance, or a camera may be selected using the position information of an information tag obtained in step S7, described later. Images are then obtained from the camera specified in step S2 (S3).

An information tag associated with the designated target is specified (S5). Tag information is then obtained from the specified information tag (S6), and information on the target is acquired from the obtained tag information (S7). The acquired information includes, for example, the attribute information such as the height, age and sex read from the attribute storage section 16 and the position information and the like included in the tag information. Finally, detailed information such as the position of the target is specified in the image obtained in step S3 using the information acquired in step S7 (S4).

FIG. 4 is a flowchart showing an example of the processing in step S4. First, motion vectors in an image are computed (S41). Using the motion vectors, a set M of image areas within each of which the directions of the vectors are the same is determined (S42). Areas whose position and motion match the position and motion of the target acquired from the information tag are then selected from the image area set M as a set Ms (S43). A template H in the shape of a human is then generated to reflect the attributes, such as the height, age and sex, acquired from the information tag (S44). A shape check using the human-shaped template H generated in step S44 is conducted on the image area set Ms (S45). Finally, the position giving the highest degree of matching in the shape check of step S45 is detected as the position of the target (S46).
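A minimal sketch of steps S41 to S46 might look like the following, using Farneback optical flow for the motion vectors and normalized cross-correlation for the shape check; the function signature, the motion threshold and the position tolerance are assumptions for illustration, not values given in the embodiment:

```python
import cv2
import numpy as np

def detect_target(prev_gray, cur_gray, tag_pos_px, tag_motion, template,
                  pos_tol=80.0):
    """Sketch of steps S41-S46. tag_pos_px: rough (x, y) image position
    from the information tag; tag_motion: rough motion direction
    (dx, dy) from the tag; template: human-shaped template H (S44 is
    assumed done by the caller, e.g. scaled from the tagged height)."""
    # S41: dense motion vectors (Farneback optical flow).
    flow = cv2.calcOpticalFlowFarneback(prev_gray, cur_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    moving = (mag > 1.0).astype(np.uint8)

    # S42: set M -- connected areas of consistent motion.
    n, labels = cv2.connectedComponents(moving)
    best_score, best_pos = -1.0, None
    for k in range(1, n):
        mask = labels == k
        mean_dir = np.array([flow[..., 0][mask].mean(),
                             flow[..., 1][mask].mean()])
        ys, xs = np.nonzero(mask)
        center = np.array([xs.mean(), ys.mean()])

        # S43: set Ms -- position and motion must match the tag.
        if np.linalg.norm(center - np.asarray(tag_pos_px)) > pos_tol:
            continue
        if np.dot(mean_dir, np.asarray(tag_motion)) <= 0:
            continue  # moving in an inconsistent direction

        # S45: shape check with the template inside this area.
        x0, x1 = xs.min(), xs.max() + 1
        y0, y1 = ys.min(), ys.max() + 1
        roi = cur_gray[y0:y1, x0:x1]
        if (roi.shape[0] < template.shape[0]
                or roi.shape[1] < template.shape[1]):
            continue
        res = cv2.matchTemplate(roi, template, cv2.TM_CCOEFF_NORMED)
        _, score, _, loc = cv2.minMaxLoc(res)
        if score > best_score:
            best_score = score
            best_pos = (x0 + loc[0], y0 + loc[1])

    # S46: the position giving the highest degree of matching.
    return best_pos, best_score
```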

The processing in this embodiment will be described in more detail with reference to FIGS. 5 to 9.

In FIG. 5, the tag communication section 12 is in a communication center 30, while the target detection section 14, the attribute lookup section 15 and the attribute storage section 16 are in a monitoring center 31. Assume here that a given person Pa carrying an information tag 11 (attached to a mobile phone 33) is detected with a camera (imaging section 13) placed outdoors at the side of a street. The information tag 11 transmits the ID information of the person Pa and rough GPS position information (error of about 10 m). In the communication center 30, the tag communication section 12 receives the information transmitted from the information tag 11. The monitoring center 31, as the object detection server, obtains an image from the camera 13 and the tag information received in the communication center 30 via a communication network. The attribute storage section 16 stores attribute information such as the height, age and sex associated with the ID information.

Assume that a person Pb and a car C currently exist near the person Pa, and that the person Pa is moving toward the imaging section 13 while the person Pb and the car C are moving away from it. At this moment, an image as shown in FIG. 6 is taken with the camera 13, in which the persons Pa and Pb and the car C are captured.

First, the image segmentation portion 141 of the target detection section 14 executes steps S41 to S43 shown in FIG. 4, to determine a partial image area having the possibility of including the person Pa.

Specifically, motion vectors in the image of FIG. 6 are computed, to determine image areas within which the directions of the vectors are the same. FIG. 7 shows the areas obtained in this way. In FIG. 7, areas APa, APb and AC are formed at and around the positions of the persons Pa and Pb and the car C, respectively.

An area whose position and motion match those of the person Pa obtained from the tag information is then selected from the areas shown in FIG. 7. FIG. 8 shows the candidate area determined when only the position information obtained from the information tag 11 is used. In FIG. 8, the position specified by the position information is transformed to a position on the camera image, shown as an area A1. Since the position information from the GPS includes an error, a large area including the persons Pa and Pb is given as the candidate area A1 to allow for a possible error. From FIGS. 7 and 8, the overlapping areas, that is, the areas APa and APb, are considered as candidate areas. Thus, the area AC of the car C can be excluded by using the position information of the person Pa.

Further, from the change in the position information, it is found that the target is moving toward the camera. Therefore, by checking the directions of the motion vectors in the areas APa and APb, the candidate area can be narrowed to only the area APa, as shown in FIG. 9.

Thereafter, the image recognition portion 142 of the target detection section 14 executes steps S44 to S46 in FIG. 4, to specify the position and posture of the person Pa in the image. That is, template matching is performed on the area APa to detect the position and motion of the person Pa more accurately.

Use of an unnecessarily large number of templates will increase the possibility of erroneously detecting a person or object other than the detection target, and will also require a vast amount of matching processing. Therefore, the templates used for the matching are narrowed down according to the attribute information, such as the height, age and sex, obtained by the attribute lookup section 15. In this way, the detection precision can be improved and the processing amount reduced.

An example of how the templates are narrowed down will be described. FIG. 10 shows an example of the information used for generation of templates, in which the height range, the average body type and the average clothing size are given for each age bracket (child, adult, senior). Assume that the age of the person Pa is found to be "20" from the attribute information. Since this age belongs to the age bracket "adult", the average height and body type can be obtained from the relationship shown in FIG. 10. The obtained height and body type are transformed to a size and shape on the camera image, to thereby generate the shape template used for the matching.

When the height range has a certain width, a plurality of templates of different sizes may be generated within the height range. If the exact height of a target is directly obtainable as attribute information, this height value may be used. If the information tag is attached to the clothing of the target and the size of the clothing is obtained as tag information, the height of the target can be derived from the relationship shown in FIG. 10 to generate a template.
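As an illustrative sketch of this narrowing, the table of FIG. 10 could be modeled as a lookup keyed by age bracket; the bracket boundaries and height ranges below are invented stand-ins, since the patent gives no concrete figures:

```python
# Hypothetical values standing in for the FIG. 10 table.
AGE_BRACKETS = {
    "child":  (1.00, 1.40),   # height range in metres
    "adult":  (1.50, 1.85),
    "senior": (1.45, 1.75),
}

def bracket_for_age(age):
    """Map an age from the attribute information to an age bracket
    (the boundary ages are assumptions)."""
    if age < 13:
        return "child"
    return "adult" if age < 65 else "senior"

def candidate_heights(age, exact_height_m=None, n=3):
    """Return the heights used to generate templates: the exact tagged
    height if available, otherwise a few samples spanning the
    bracket's height range."""
    if exact_height_m is not None:
        return [exact_height_m]
    lo, hi = AGE_BRACKETS[bracket_for_age(age)]
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]
```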

Once the accurate position of the person Pa in the area APa is detected by the template matching, the change of the position over time may be followed to obtain the accurate motion of the person Pa. If shape templates of different postures are used, the posture of the person Pa can also be detected. Each template used for the matching may be a gray-scale image, a binary image representing the shape, or a color image, or it may be an image merely representing the outline.

Once the accurate position of the person Pa is successfully detected in the image, the camera can zoom in on the person Pa to capture only the person Pa, for example. Alternatively, the zoom rate may be lowered so that the surroundings of the person Pa are captured at all times. This enables continuous monitoring not only of the state of the person Pa but also of how the person Pa comes into contact with neighboring persons. This will be useful for crime investigation, behavior investigation and the like.

As described above, in this embodiment, in the detection of a given target Pa in an image, tag information transmitted from the information tag 11 attached to the target Pa is used. In this way, information that would not be obtainable from the image can be obtained from the tag information itself (for example, attribute information and position information transmitted from the information tag 11) or can be retrieved using the tag information (for example, attribute information read from the attribute storage section 16), and such information can be utilized in the image processing. This enables detection of a given target in an image without requiring an enormous amount of processing.

In this embodiment, both the image segmentation portion 141 and the image recognition portion 142 refer to the information from the tag communication section 12 and the attribute lookup section 15. Instead, either one of the image segmentation portion 141 and the image recognition portion 142 may refer to the information from the tag communication section 12 and the attribute lookup section 15, and the other may perform its processing using only the image information.

In this embodiment, the information tag 11 is attached to a mobile phone carried by the person Pa. Alternatively, the information tag 11 may be attached to other portable equipment such as a PDA. The information tag 11 may otherwise be attached to something carried by the target person, such as a stick or a basket; something moving together with the target person, such as a wheelchair or a shopping cart; or something worn by the target person, such as clothing, glasses or shoes. The information tag 11 may even be embedded in the body of the target person.

In this embodiment, the information tag 11 transmits the attribute information stored in the storage portion 111 and the position information detected by the position detection portion 112. The information to be transmitted is not limited to these. For example, the information tag 11 may transmit information on the motion of the detection target obtained using an acceleration sensor, a compass, a gyro device and the like, or information on the posture of the detection target obtained using a gyro device and the like. In the case of transmission of the posture, the templates used for recognition by the image recognition portion 142 can be limited to shape templates having the specific posture. This enables further improvement in detection precision and reduction in processing amount.

The attribute information stored in the information tag 11 and the attribute storage section 16 is not limited to that described in this embodiment. For example, when a person is the target to be detected, the skin color and the hair color may be stored as attribute information. The image segmentation portion 141 may then detect an area having the specific color, or the image recognition portion 142 may use a template reflecting the specific skin color or hair color. Improvement in detection precision can be expected from this processing.

History information on a detection target, such as the detection time, the detection position, the motion speed and the clothing, may also be stored as attribute information. By comparing such history information with the information currently obtained, whether or not the detection result is anomalous can be determined.

The detection target is not limited to a person, but may be a pet or a car, for example. When a pet is to be detected, an information tag may be attached to its collar or the like. By capturing images with an indoor camera, detailed images of the pet can be obtained and its behavior can be grasped from a remote location. In this case, if the state of the pet is to be checked on a small display screen, such as that of a mobile phone, an image covering the entire room will fail to give a clear view of the pet. According to the present invention, in which the position of the pet in an image can be accurately detected, only an image area including the pet can be displayed on the small display screen. Thus, the state of the pet can be easily grasped.

In the case of monitoring cars with an outdoor monitoring system and the like, information tags may be attached to the cars in advance. With the information tag, the position of a specific car can be detected accurately in a monitoring image, and thus an image of the driver of the car can easily be acquired automatically, for example. This can be used for theft prevention and the like.

<Use of Motion Trajectory>

In the example described above, in the integrated use of the image information and the information obtained from the information tag, the candidate area was narrowed using the position and motion direction of the detection target. The present invention is not limited to this; the motion trajectory, for example, may also be used, as described below.

Assume in this case that, in the environment shown in FIG. 5, the car C does not exist and both the person Pa carrying the information tag 11 and the person Pb are walking toward the camera 13, although the walking trajectories of the persons Pa and Pb are different.

FIG. 11 shows an image taken with the camera 13 in the above situation. In FIG. 11, the motion trajectories TPa and TPb obtained from the images are shown by the solid arrows. Although the image processing can provide detailed motion trajectories, it has difficulty distinguishing the person Pa from the person Pb.

FIG. 12 shows a motion trajectory T11 based on the position information obtained from the information tag 11. Since the precision of the position information is low, the error range of the position is represented by the width of the arrow in FIG. 12. Although the precision is low, the position information can provide an outline of the motion trajectory.

The motion trajectory T11 in FIG. 12 is compared with the trajectories TPa and TPb in FIG. 11 to determine their similarity. In the illustrated example, the trajectory TPa is more similar to the motion trajectory T11. Therefore, the person Pa is specified as the detection target, and the accurate position of the person Pa in the image is obtained.

As described above, by using the similarity of motion trajectories, persons and objects comparatively similar in position and motion direction can be distinguished from one another. The similarity of motion trajectories can be determined, for example, by calculating the proportion of overlap between the trajectories, comparing the lengths of the trajectories, comparing the positions at which the trajectories change direction, or comparing motion vector series.
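One way to score such similarity is sketched below: both trajectories are resampled to a common length and the mean point-to-point distance is converted into a score. This is only one of the options the text mentions, and the resampling count and scoring formula are arbitrary illustrative choices:

```python
import numpy as np

def trajectory_similarity(traj_a, traj_b, n=20):
    """Crude similarity between two trajectories, each an (m, 2) array
    of image positions: resample both to n points and turn the mean
    point-to-point distance into a score (higher = more similar)."""
    def resample(t):
        t = np.asarray(t, dtype=float)
        idx = np.linspace(0, len(t) - 1, n)
        # Interpolate x and y independently along the trajectory.
        return np.stack([np.interp(idx, np.arange(len(t)), t[:, d])
                         for d in range(2)], axis=1)
    a, b = resample(traj_a), resample(traj_b)
    mean_dist = np.linalg.norm(a - b, axis=1).mean()
    return 1.0 / (1.0 + mean_dist)
```

For the example of FIGS. 11 and 12, the trajectory (TPa or TPb) scoring higher against T11 would be selected as the detection target.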

<Use of a Plurality of Cameras>

Although one camera was used in the example described above, a plurality of cameras may of course be used. For example, as shown in FIG. 13, for monitoring an out-of-sight place, a plurality of cameras C1 to C3 may be placed so that no blind spot exists. In a cranked path as shown in FIG. 13, a person P will fall outside the image taken with the camera C1 by moving only a few steps to the right. Therefore, even with a camera placement free from blind spots, if the accurate position of the person P is unknown, it will be difficult to select a suitable camera to follow the person P. By adopting the present invention, an out-of-sight place can be monitored over a wide area, and the position of the person P can be specified in the images.

In other words, a camera capable of capturing the person P can be specified based on the position information obtained from the information tag. When the person P is at the position shown in FIG. 13, the camera image giving the largest figure of the person P, out of the images from the cameras C1 and C2, can be displayed automatically by detecting the person P using the tag information.

<Linking of Position Information of an Information Tag with Image Coordinates>

For realization of the present invention, the position indicated by the position information of an information tag 11 must be linked in advance with coordinates in a camera image. This will be described briefly.

The position information of an information tag 11 is linked with image coordinates using a coordinate transformation T that transforms position coordinates (world coordinates) in the three-dimensional space in which a detection target exists to coordinates in a camera image. By determining the coordinate transformation T in advance, it is possible to link the position coordinates of an information tag with coordinates in an image.

In general, the coordinate transformation T may be computed theoretically based on the layout of a camera (the focal length of the lens, the lens distortion characteristics, the size and number of pixels of the imaging device) and the placement conditions of the camera (the position and posture of the camera), or may be determined by a camera calibration procedure to be described later. When the camera layout and the camera placement conditions are known, the coordinate transformation T can be determined by a calculation combining geometric transformations and the like. When the camera layout and the camera placement conditions are not known, the coordinate transformation T can be determined by camera calibration.

A method of determining the coordinate transformation T by camera calibration will be described with reference to the flow shown in FIG. 14. Assume in this description that the position, posture and zoom of the camera are fixed. First, at least six pairs of position coordinates in the three-dimensional space in which a detection target exists (world coordinates) and the corresponding position coordinates in an image (image coordinates) are prepared (E11). A linear transformation satisfying the correspondence of the coordinate pairs prepared in step E11 is determined by the method of least squares or the like (E12). The computed parameters of the linear transformation (camera parameters) are stored (E13).
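Steps E11 to E13 correspond to the classical direct linear transform (DLT); a sketch using at least six world-to-image correspondences, solved in the least-squares sense via SVD, might look like this (the helper names are hypothetical, not from the patent):

```python
import numpy as np

def calibrate_dlt(world_pts, image_pts):
    """Estimate the 3x4 projection matrix T mapping world coordinates
    to image coordinates from >= 6 point correspondences (E11-E13),
    via a direct linear transform. world_pts: (n, 3) world
    coordinates; image_pts: (n, 2) image coordinates."""
    rows = []
    for (X, Y, Z), (u, v) in zip(world_pts, image_pts):
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u*X, -u*Y, -u*Z, -u])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -v*X, -v*Y, -v*Z, -v])
    A = np.asarray(rows, dtype=float)
    # Least-squares solution: the right singular vector belonging to
    # the smallest singular value (the camera parameters of step E13).
    _, _, vt = np.linalg.svd(A)
    return vt[-1].reshape(3, 4)

def project(T, world_pt):
    """Transform a world point to image coordinates with the stored
    camera parameters T (homogeneous projection plus normalization)."""
    p = T @ np.append(np.asarray(world_pt, dtype=float), 1.0)
    return p[:2] / p[2]
```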

The stored camera parameters may be used for subsequent transformations of the position coordinates of an information tag to image coordinates.

When the position, posture or zoom of the camera is changed, a coordinate transformation for the state after the change may be prepared again by camera calibration. If a sensor is separately provided to detect the position, posture and zoom (lens focal length) of a camera, new parameters can be determined by calculation. For a camera whose position and posture change frequently, such as a camera mounted on a mobile unit like a robot or a car, it is desirable to detect the position and posture of the camera with a separate sensor and determine the camera parameters by calculation every time a change is detected.

(Embodiment 2)

In Embodiment 2 of the present invention, assume that a camera-equipped movable robot detects an object as a given detection target.

FIG. 15 shows the situation in this embodiment. Referring to FIG. 15, a movable robot 40 is placed on a floor FL in a house. The robot 40, as the object detection equipment, includes a camera 13 as the imaging section, as well as the tag communication section 12, the target detection section 14 and the attribute lookup section 15 described in Embodiment 1. The attribute storage section 16 is placed at a location separate from the robot 40, and the attribute lookup section 15 refers to the attribute information stored in the attribute storage section 16 via radio communication.

A cylindrical object Oa in a fallen state and a spherical object Ob, both red, are on the floor FL. An information tag 11 is embedded in each of the objects Oa and Ob and transmits ID information as tag information. Antennas 43a to 43d are placed at the four corners of the floor FL, allowing the information transmitted from the information tag 11 to be received by the tag communication section 12, via an antenna 42 of the robot 40, by way of the antennas 43a to 43d. The attribute lookup section 15 reads the shape and color of the object from the attribute storage section 16 as attribute information. The tag communication section 12 estimates a rough position of the information tag 11 from the ratio of the reception intensities among the antennas 43a to 43d placed at the four corners.
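A rough position estimate of this kind could be sketched as an intensity-weighted centroid of the four antenna positions, as below; the weighting scheme is an assumption, since the embodiment does not specify how the intensity ratio is converted to a position:

```python
import numpy as np

def estimate_tag_position(antenna_xy, intensities):
    """Rough tag position from the reception-intensity ratio among the
    four corner antennas 43a-43d: an intensity-weighted centroid of
    the antenna positions (stronger reception pulls the estimate
    toward that corner). A coarse heuristic, not a precise fix.
    antenna_xy: (4, 2) antenna coordinates; intensities: (4,)."""
    w = np.asarray(intensities, dtype=float)
    w = w / w.sum()
    return w @ np.asarray(antenna_xy, dtype=float)
```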

Assume that the robot 40 moves, catches hold of the object Oa or Ob as a given detection target with its hand 41 and moves the object. To catch hold of an object with the hand 41, the robot 40 must accurately detect the position, shape and orientation of the object.

FIG. 16 shows an image taken with the camera 13 in the situation shown in FIG. 15. The existence of two kinds of objects is recognized from the ID information from the information tags 11. Rough positions of the information tags 11 are estimated from the ratio of the reception intensities among the antennas 43a to 43d. In this case, therefore, the ID information and the radio intensity are used as information from the information tags 11.

Assume that for one of the two kinds of objects, the object Oa, for example, attribute information indicating that the shape is cylindrical and the color is red has been obtained by looking up the attribute storage section 16 using the ID information. In this case, the image segmentation portion 141 (not shown in FIG. 15) of the target detection section 14 determines candidate areas that are red in the image, based on the attribute information that the detection target is red. FIG. 17 is an image showing the result of this determination, in which two candidate areas BOa and BOb, respectively corresponding to the red objects Oa and Ob, are shown.

A candidate area B1 as shown in FIG. 18 is obtained based on the position information of the object Oa estimated from the reception intensities. The candidate areas BOa and BOb in FIG. 17 and the candidate area B1 in FIG. 18 are integrated to obtain an area B2 as shown in FIG. 19. In this way, the position of the object Oa is accurately obtained.

The image recognition portion 142 (not shown in FIG. 15) of the target detection section 14 generates shape template images for the various orientations a cylindrical object can take, based on the attribute information that the object is cylindrical. Template matching is performed on the image of the area B2 using these shape templates, and the orientation (posture) of the object can be accurately determined from the orientation corresponding to the shape template giving the highest degree of matching.
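For illustration, the orientation search might be sketched as matching one base template rotated through a set of candidate angles and keeping the best score; rotating a single template stands in for generating per-orientation shape templates, and the angle step is an arbitrary choice (the candidate area image is assumed to be larger than the template):

```python
import cv2

def best_orientation(roi, base_template, angles_deg=range(0, 180, 15)):
    """Match a cylinder template rendered at several orientations
    against the candidate area B2 and return (score, angle) for the
    orientation giving the highest matching score."""
    h, w = base_template.shape[:2]
    best = (-1.0, None)
    for a in angles_deg:
        # Rotate the base template about its centre by angle a.
        m = cv2.getRotationMatrix2D((w / 2, h / 2), a, 1.0)
        t = cv2.warpAffine(base_template, m, (w, h))
        res = cv2.matchTemplate(roi, t, cv2.TM_CCOEFF_NORMED)
        score = float(res.max())
        if score > best[0]:
            best = (score, a)
    return best
```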

The position and posture of the other object Ob can also be accurately detected in the manner described above. As a result, the robot 40 can obtain the information necessary for moving the objects Oa and Ob with the hand 41. In other words, with the color and shape obtained as attribute information, image segmentation and image recognition can be performed for a specific color and shape, and in this way, highly precise and efficient object detection can be realized.

As described above, by effectively using the information obtained from an information tag 11 attached to an object together with the information of an image, the position and posture of the object can be detected with high precision. Moreover, the processing amount required for the detection can be kept low. This makes it possible to realize the object detection required for a robot to work in a complicated environment such as a room of a house.

Accurate and real-time processing is desired in the detection of a target by a robot. Therefore, a technology such as that of the present invention, which improves detection precision while keeping the processing amount from increasing, is effective. For example, for robots used for rescue operations, nursing care and the like, a processing delay may threaten human life or cause injury. Also, assume a case in which a robot is intended to come into contact with a specific person among a number of persons. Even if the specific person is accurately identified, the person will pass by the robot while the robot is still executing the processing if the processing time is excessively long. In such a case, too, by adopting the present invention, the target can be detected in an image swiftly and accurately.

In the example described above, the relationship between the ID information and the attribute information was stored in the attribute storage section 16. Alternatively, the attribute information may be stored directly in the information tag 11 attached to each individual object. With such direct storage, the attribute storage section and the attribute lookup section can be omitted, and thus the system layout can be simplified. Conversely, when the attribute storage section 16 is provided, the memory capacity of the information tag 11 can be kept small even when the amount of information used for detection increases. This enables reduction of the size and cost of the information tag 11. Also, the communication capacity between the information tag 11 and the tag communication section 12 can be kept from increasing.

A detection procedure itself may be recorded in the information tag 11. For example, in the illustrated case, information on a detection procedure such as "detect a red object" or "produce cylindrical shape templates and perform shape matching" may be transmitted from the information tag 11 attached to the object Oa. Alternatively, information on a detection procedure may be stored in the attribute storage section 16 in association with the ID information, so as to be read from the attribute storage section 16 based on the ID information received from the information tag 11. In these cases, the target detection section 14 simply executes processing according to the detection procedure received from the information tag 11 or read from the attribute storage section 16. This can simplify the detection program installed in the robot itself. Moreover, even when a different kind of object is newly introduced, no change is necessary to the detection program installed in the robot, and thus maintenance is simplified.

(Embodiment 3)

In Embodiment 3 of the present invention, a person as a subject is detected as the given detection target with a portable camera. The portable camera as used herein includes a hand-held video camera, a digital camera, a camera-equipped mobile phone and an information terminal.

FIG. 20 is a view showing the situation in this embodiment, in which a person Pd is photographed outdoors with a portable camera 50 having the imaging section 13. The person Pd carries an information tag 11, which transmits ID information specifying the person Pd as tag information via an ultrasonic wave. When an ultrasonic transmitter like those generally used for distance measurement is used, the detection range is about 20 m from the camera, although it varies with the transmission intensity of the ultrasonic wave.

The camera 50 includes two microphones 51a and 51b, a tag communication section 12A and a target detection section 14A. The microphones 51a and 51b receive the ultrasonic wave transmitted by the information tag 11. The tag communication section 12A obtains the ID information from the ultrasonic signals received by the microphones 51a and 51b, and also computes the direction and distance of the information tag 11 with respect to the camera 50. The direction of the information tag 11 with respect to the camera 50 can be estimated from the time difference (phase difference) or the intensity ratio between the ultrasonic signals received by the two microphones 51a and 51b. The distance to the information tag 11 can be estimated from the degree of attenuation (the degree to which the intensity and waveform are dulled) of the received ultrasonic signals.

FIG. 21 is a block diagram showing a configuration example of the camera 50 as the object detection equipment. Referring to FIG. 21, the tag communication section 12A is essentially composed of the microphones 51a and 51b, a distance determination portion 52, a time difference computation portion 53, a direction determination portion 54 and an ID extraction portion 55. The target detection section 14A is essentially composed of an image coordinate determination portion 56 and an image segmentation portion 142. The two microphones 51a and 51b are placed at horizontally separated positions.

The distance determination portion 52 computes the distance to the information tag 11 based on the intensity of the ultrasonic waves received by the microphones 51a and 51b. The time difference computation portion 53 computes the difference in detection time between the ultrasonic signals received by the microphones 51a and 51b. The direction determination portion 54 computes the direction of the information tag 11 with respect to the camera 50 (the direction in the horizontal plane including the microphones 51a and 51b) based on the detection time difference computed by the time difference computation portion 53. Assume that the direction determination portion 54 holds the correspondence between the detection time difference and the direction.
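As a sketch of the underlying geometry (these formulas are standard far-field assumptions, not the patent's stored correspondence table): the bearing follows sin θ = cΔt/d from the time difference Δt and the microphone spacing d, and a 1/r spreading model gives a crude distance from the attenuation:

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C

def tag_direction(dt_seconds, mic_spacing_m):
    """Direction of the information tag in the horizontal plane of the
    two microphones, from the detection-time difference, under the
    far-field relation sin(theta) = c * dt / d (theta = 0 means
    straight ahead along the optical axis)."""
    s = SPEED_OF_SOUND * dt_seconds / mic_spacing_m
    return math.asin(max(-1.0, min(1.0, s)))

def tag_distance(received_level, reference_level, reference_dist_m=1.0):
    """Very rough distance from attenuation, assuming the level decays
    as 1/r (spherical spreading). Real ultrasonic propagation also
    suffers air absorption, so a calibrated correspondence table, as
    the text implies, would be more robust."""
    return reference_dist_m * reference_level / received_level
```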

The image coordinate determination portion 56 determines the horizontal position of the information tag 11 in an image based on the direction of the information tag 11 obtained by the direction determination portion 54 and the lens focal length (degree of zooming) of the imaging section 13 sent from a camera control section 57. This processing is substantially the same as the processing of associating the position information of an information tag with position coordinates in an image described in Embodiment 1.
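The mapping from the direction θ to a horizontal image coordinate can be sketched with the pinhole relation x = cx + f·tan θ, where the focal length in pixels already reflects the current zoom; this formula is an assumption consistent with the coordinate linking of Embodiment 1, not text from the embodiment:

```python
import math

def direction_to_image_x(theta_rad, focal_px, image_width_px):
    """Horizontal image coordinate of the information tag from its
    direction theta (0 = optical axis) and the focal length in pixels,
    with the image centre taken as the principal point."""
    return image_width_px / 2.0 + focal_px * math.tan(theta_rad)
```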

The ID extraction portion 55 extracts the ID information from the ultrasonic signals received by the microphones 51a and 51b. The attribute lookup section 15 reads the height of the person Pd as the detection target from the attribute storage section 16 using the ID information. A template generation part 143 of the image segmentation portion 142 generates a shape template reflecting the size of the person Pd in the image, based on the distance obtained by the distance determination portion 52, the height of the person Pd obtained by the attribute lookup section 15 and the lens focal length sent from the camera control section 57. A template matching part 144 performs matching using the template generated by the template generation part 143 in an area of the image at and around the position of the information tag 11, to detect the position of the person Pd. The detected position information of the person Pd is given to the camera control section 57. Using the position information, the camera control section 57 controls the imaging section 13 to perform more accurate focus adjustment, exposure adjustment, color correction, zoom adjustment and the like.

A flow of the processing in this embodiment will be described with reference to the flowchart of FIG. 22, taking the situation shown in FIG. 20 as an example.

First, the information tag 11 carried by the person Pd, the subject and detection target, transmits ID information via an ultrasonic signal (T1). The microphones 51a and 51b of the camera 50 receive the transmitted ultrasonic signal (T2). The difference in the reception time of the ultrasonic signal between the microphones 51a and 51b is computed (T3), and the direction θ of the information tag 11 with respect to the camera 50 is computed from the reception time difference (T4).

The distance D to the information tag 11 is computed from the intensity of the ultrasonic signal received by the microphones 51a and 51b (T5). At this time, whether or not the position in the direction θ falls within the image being taken is determined, considering the zoom magnification of the imaging section 13 (T6). If it is determined that the position does not fall within the image (NO in T6), then, for example, the zoom magnification is lowered so that the position falls within the image, the imaging section 13 is turned toward the direction θ (if the imaging section 13 is mounted on a movable pan head), or a message is displayed on a monitor of the camera 50 notifying the user that the detection target is outside the coverage, or indicating on which side of the image the detection target lies, to urge the user to turn the camera 50 (T7). In this connection, recording may be automatically halted as long as the target is determined to be outside the coverage, and automatically started once the target falls within the coverage.

If it is determined that the position in the direction θ falls within the image (YES in T6), the focus of the camera is adjusted according to the distance D (T8). Assume that an image as shown in FIG. 23 is obtained by this adjustment. The position (area) L1 of the information tag 11 in the image is then determined based on the zoom magnification and the direction θ (T9). Note, however, that when the direction is computed from an ultrasonic signal, an error may occur due to the influence of temperature, wind, reflection from surrounding objects, noise and the like. Therefore, it is difficult to limit the area so narrowly as to be able to specify a single person. In the example shown in FIG. 23, not only the person Pd to be detected but also a person Pe is included in the area L1.

The ID information is extracted from the ultrasonic signal (T10), and using the ID information, the height H of the target is acquired (T11). A shape template T corresponding to the expected size of the target in the image is generated based on the height H, the distance D and the zoom magnification (T12). FIG. 24 shows an example of the template T generated in this way. With the template T shown in FIG. 24, matching is performed over an area at and around the position L1 in the image. The position giving the highest degree of matching is determined as the accurate position L2 (T13). In this matching, only the person Pd is detected, because the person Pe differs from the person Pd in size in the image. The position L2 is displayed on the image, allowing the user to easily adjust the orientation of the camera and the like as required. Also, adjustment of the aperture and exposure, color correction, focus adjustment and the like of the imaging section 13 are performed according to the color, brightness and quality of the area at and around the position L2 (T14). In this way, even a user unfamiliar with photography can take images of the person Pd as the subject accurately.

Step T5 may be executed at any time after step T2 and before the distance D is used. Step T10 may be executed at any time between step T2 and step T11.

As described above, in this embodiment, persons similar to each other in size in the image (persons Pd and Pf) and persons close to each other in position (persons Pd and Pe) can be distinguished correctly. Accordingly, even in a situation involving a number of persons, the position of the target can be accurately detected without largely increasing the processing amount. In addition, by using an ultrasonic transmitter as the information tag, the direction and distance can be computed with a simple and inexpensive system.

In the example described above, two microphones were used. The number of microphones is not limited to two, but may be three or more. If three or more microphones are used, it is possible to compute the direction of the information tag using a plurality of combinations of any two microphones and average the computation results, thereby improving the precision of the direction computation.

The camera may trigger the information tag to transmit the ultrasonic wave by use of an ultrasonic wave, a radio wave, light or the like. In this case, the time taken from the triggering until reception of the ultrasonic signal may be measured, to compute the distance to the information tag based on the measured time and the sonic speed.
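Under this triggered scheme, the distance computation reduces to multiplying the measured time by the sonic speed, assuming the trigger signal (for example, a radio wave) propagates effectively instantaneously compared with sound; a one-function sketch:

```python
def distance_from_trigger(elapsed_s, speed_of_sound=343.0):
    """Distance to the tag when the camera triggers the ultrasonic
    transmission: elapsed time from trigger to reception, times the
    sonic speed (one-way travel, since only the sound leg is slow)."""
    return speed_of_sound * elapsed_s
```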

The detection processing described above may be performed only when specific ID information is obtained. This enables detection of only a target having a specific information tag in a situation where a plurality of information tags exist.

The information from the information tag and the image processing need not necessarily both be used at all times. For example, if a signal from the information tag is temporarily unreceivable due to the influence of noise and the like, the reception state is determined to be bad, and the processing may be automatically switched to detection processing with images only. In this case, a template used immediately before the switching may be reused. Conversely, if detection with a template in an image temporarily fails due to the influence of a change of sunlight and the like, this detection is determined to have failed, and the processing may be automatically switched to detection using only the information on the direction and distance of the information tag. In this way, if one of the two types of information is unusable, the processing is automatically switched to use only the other type of information. This may lower the detection precision, but it prevents complete loss of the target, and thus object detection equipment robust against changes in the situation can be attained.

FIG. 25 is a flowchart of a procedure for switching the processing. Referring to FIG. 25, first, whether or not an ultrasonic signal has been normally received by the microphones is determined (K1). If it has been normally received (YES in K1), template matching is performed in an area of the image at and around the estimated position of the information tag determined in the processing described above (K2). Whether or not the maximum degree of matching is equal to or greater than a predetermined threshold is then determined (K4). If it is equal to or greater than the threshold (YES in K4), the position giving the maximum degree of matching is determined as the detected position of the person (K7). If it is less than the threshold (NO in K4), the position estimated from the ultrasonic signal is adopted as the detected position of the person (K6). Since this detection is low in reliability, a message notifying the user that the detection precision is low or that detection with an image is difficult may be displayed on the monitor.

If normal reception of an ultrasonic signal fails in step K1 (NO in K1), template matching is performed over the entire image (K3). Alternatively, template matching may be performed in an area at and around the position detected in the preceding frame. Whether or not the maximum of the degree of matching is equal to or more than a predetermined threshold is determined (K5). If it is equal to or more than the threshold (YES in K5), the position giving the maximum degree of matching is determined as the detected position of the person (K9). In this case also, since the detection is low in reliability, a message notifying that the detection precision is low or that detection with ultrasonic waves is difficult may be displayed on the monitor. If it is less than the threshold (NO in K5), failure of position detection is determined. This failure is displayed on the monitor, or the position detected in the preceding frame is adopted (K8).
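The branching of FIG. 25 can be rendered in code roughly as follows. This is an illustrative sketch only, not the actual implementation: the OpenCV-based matcher, the threshold value and the window size are assumptions of the example.

    import cv2

    MATCH_THRESHOLD = 0.7  # illustrative threshold on the degree of matching
    SEARCH_RADIUS = 60     # illustrative half-width (px) of the search window

    def match_template(image, template, around=None):
        """Return (best position, best score) of normalised template
        matching, restricted to a window around `around` = (x, y) when
        an estimated position is given."""
        h, w = template.shape[:2]
        x0 = y0 = 0
        if around is not None:
            x, y = around
            x0, y0 = max(0, x - SEARCH_RADIUS), max(0, y - SEARCH_RADIUS)
            image = image[y0:y + SEARCH_RADIUS + h, x0:x + SEARCH_RADIUS + w]
        scores = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
        _, best, _, (bx, by) = cv2.minMaxLoc(scores)
        return (x0 + bx, y0 + by), best

    def detect_position(frame, template, tag_estimate, prev_position):
        """Switching procedure of FIG. 25; the step labels K1-K9 in the
        comments follow the text."""
        if tag_estimate is not None:                       # K1: received?
            pos, score = match_template(frame, template, tag_estimate)  # K2
            if score >= MATCH_THRESHOLD:                   # K4
                return pos, "detected"                     # K7
            return tag_estimate, "tag only (low reliability)"           # K6
        pos, score = match_template(frame, template)       # K3: entire image
        if score >= MATCH_THRESHOLD:                       # K5
            return pos, "image only (low reliability)"     # K9
        return prev_position, "failed: preceding position kept"         # K8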

In the example described above, the detection procedure is switched when one of the information types is unusable. The following techniques may also be adopted to change the detection procedure.

When a plurality of types of ID information are obtained from information tags, for example, the range of the image within which the matching is performed may be made wider than in the case where a single type of ID information is obtained. With this setting, the occurrence of misdetection can be suppressed even when interference of ultrasonic signals occurs due to the existence of a plurality of transmitting sources and degrades the precision of position detection with an ultrasonic signal.
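One way to realize this, sketched below with purely illustrative numbers, is to scale the matching window according to whether more than one tag ID is currently being received.

    BASE_RADIUS_PX = 40  # illustrative search half-width for a single tag

    def search_radius(num_tag_ids):
        """Widen the template-matching window when several information
        tags are transmitting, since interference between their
        ultrasonic signals makes the tag-derived position less precise."""
        widen = 2.0 if num_tag_ids > 1 else 1.0  # illustrative factor
        return int(BASE_RADIUS_PX * widen)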

When there exist a plurality of positions giving high degrees of matching, the setting may be changed to determine the position obtained from the information tag as the detected position of the person. This can suppress the occurrence of erroneously detecting a wrong person, and the resulting frequent displacement of the detection position, when very similar persons exist close to each other.
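A sketch of this rule follows; the peak-list format and the threshold are assumptions of the example.

    def resolve_ambiguity(match_peaks, tag_position, threshold=0.7):
        """match_peaks: list of (position, score) pairs from template
        matching.

        If zero or several positions match almost equally well (e.g. very
        similar persons standing close together), trust the position
        derived from the information tag rather than risk jumping to the
        wrong person."""
        strong = [(pos, s) for pos, s in match_peaks if s >= threshold]
        if len(strong) == 1:
            return strong[0][0]
        return tag_position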

Part or the entirety of the processing performed by the object detection equipment of the present invention may be performed by exclusive equipment, or may be implemented as a processing program executed by a CPU incorporated in a computer. Alternatively, as in the monitoring center 31 shown in FIG. 5, an object detection server may receive an image taken by the imaging section and tag information transmitted from an information tag attached to a given target, and detect the given target in the image.

As described above, according to the present invention, by using an image and tag information transmitted from an information tag attached to a given target in integration, the target can be detected accurately in the image and the posture, motion and the like of the target can be specified, even in a place in which the lighting condition greatly changes, such as outdoors, and in a situation in which a plurality of persons and objects exist. Moreover, the processing amount can be suppressed from increasing, and thus the processing time and cost can be significantly reduced compared with the case of performing image processing alone.

CLAIMS

1. An object detection equipment comprising: an imaging section for taking an image; a tag communication section for receiving tag information transmitted from an information tag attached to a given target; and a target detection section for detecting the given target in the image taken by the imaging section by acquiring position information of the information tag in the image using the tag information received by the tag communication section and by performing image processing by referring to the acquired position information.
2. The object detection equipment of claim 1, wherein the tag information includes attribute information representing an attribute of the given target, and the target detection section performs the detection using the attribute information included in the tag information received by the tag communication section.
3. The object detection equipment of claim 1, wherein the tag information includes ID information of the given target, the object detection equipment further comprises: an attribute storage section for storing a correspondence between ID information and attribute information; and an attribute lookup section for looking up contents of the attribute storage section using the ID information included in the tag information received by the tag communication section to obtain the attribute information of the given target; and the target detection section performs the detection using the attribute information obtained by the attribute lookup section.
4. The object detection equipment of claim 1, wherein the target detection section comprises: an image segmentation portion for determining a partial image area having the possibility of including the given target in the image by acquiring position information of the information tag in the image using the tag information and by referring to the acquired position information; and an image recognition portion for detecting the given target in the partial image area determined by the image segmentation portion.
5. The object detection equipment of claim 1, wherein the tag information includes position information representing a position of the information tag, and the target detection section performs the detection by referring to the position information included in the tag information received by the tag communication section.
6. The object detection equipment of claim 1, wherein the tag communication section estimates a position of the information tag from a state of reception of the tag information, and the target detection section performs the detection by referring to the position estimated by the tag communication section.
7. The object detection equipment of claim 1, wherein the tag information includes a detection procedure for the given target, and the target detection section performs the detection by executing the detection procedure included in the tag information received by the tag communication section.
8. The object detection equipment of claim 1, wherein the target detection section performs the detection with image processing only, without use of the tag information, when a reception state of the tag communication section is bad.
9. An object detection server for receiving an image taken by an imaging section and tag information transmitted from an information tag attached to a given target, and detecting the given target in the image by acquiring position information of the information tag in the image using the tag information and by performing image processing by referring to the acquired position information.
10. An object detection method comprising the steps of: receiving an image taken by an imaging section; receiving tag information transmitted from an information tag attached to a given target; and detecting the given target in the image by acquiring position information of the information tag in the image using the tag information and by performing image processing by referring to the acquired position information.
11. The object detection server of claim 9, wherein in detecting the given target, a partial image area having the possibility of including the given target is determined in the image by acquiring position information of the information tag in the image using the tag information and by referring to the acquired position information; and the given target is detected in the determined partial image area.
12. The object detection method of claim 10, wherein the detecting step comprises: an image segmentation step of determining a partial image area having the possibility of including the given target in the image by acquiring position information of the information tag in the image using the tag information and by referring to the acquired position information; and an image recognition step of detecting the given target in the partial image area determined in the image segmentation step.
13. The object detection equipment of claim 1, wherein the target detection section links in advance position coordinates in a three-dimensional space in which the given target exists with coordinates in the image, and refers to the linking to acquire position information of the information tag in the image using the tag information.